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1. Introduction and Motivation 

Hybrid Monte Carlo (HMC) ^ is the algorithm of choice to generate lattice configurations 
including the effect of dynamical fermions. Nevertheless, the generation of gauge field configura- 
tions at large volumes and light quark masses is still very expensive computationally. 

One principal ingredient of HMC is the molecular dynamics (MD) step, which consists of a 
reversible volume-preserving approximate MD trajectory of t/St steps (with t being the length 
of the trajectory in suitable units, and 8t the step size) followed by a Metropolis accept/reject test 
with acceptance probability min(l,e~ 5ff ) where 8H is the change in the Hamiltonian H over the 
trajectory. 

A molecular dynamics trajectory is not only an approximate integral curve of the Hamiltonian 
vector field H corresponding to H, but is also an exact integral curve of the Hamiltonian vector 
field H of an exactly conserved Shadow Hamiltonian H. The asymptotic expansion of this Shadow 
Hamiltonian in the step size 8x may be computed using the Baker-Campbell-Hausdorff (BCH) 
formula and expressed in terms of Poisson brackets [Q]. 

As a simple example consider a single level Omelyan (PQPQP) integrator ^ 

£/ pQP0P0 (T) = ^S8r e ^8r e (l-2l)S8r e ^8r e ?.SSry^ 

whose shadow Hamiltonian is 

^pqpqpq = H PQPQPQ + ( 6A2 ~ f + 1 {S, {S,T}} + ^^{T, {S,T}}\ St 2 + 0(St 4 ). (1.1) 

Note that we have one free tunable parameter, A, which is often set to some ad hoc value not taking 
Poisson brackets into account [Q]. 



2. Integrator tuning 

We define the difference between the shadow (H) and actual (H) Hamiltonians as AH = H — H. 
We have previously suggested []|] that ^(8H 2 ) m Var(AH), where the right hand side is the variance 
of the distribution of values of AH over phase space. This formula assumes that the trajectories are 
long enough that the end points are more-or-less independent of the starting points, and accurate 
enough that their distribution is still close to e~ H . We can therefore estimate the acceptance rate 
from Var(A//) 

P acc = erfc ^ ^(SH 2 )^ = erfc ^ ^Va^A//)^ (2.1) 

The advantage of using Var(Aff) is that one only needs to measure the Poisson brackets from 
equilibrated configurations. We can thus express P acc as a function of the integrator parameters and 
find their optimal values that maximize the acceptance rate. 

As a simple test, we consider a HMC simulation of two flavors of Wilson fermions at K = 0.158 
and Wilson gauge action at j3 = 5.6 on an 8 4 lattice. We use a single level PQPQP integrator and 
a unit trajectory length, therefore we have two parameters to tune, namely A and the step size 
8t. In Figure [l] we compare the acceptance rates predicted by the formula above (red curve) with 
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dt 



(a) Acceptance rate as a function of X, with St = 0. 1. (b) Acceptance rate as a function of St, with A = 0.18. 

Figure 1: Comparison of measured acceptance rates and their predictions from average Poisson brackets. 



numerical data taken from simulations at various values of A and St (black dots). The Poisson 
Bracket values used for the predictions were measured at A = 0. 18 and 8t = 0.1. In Figure 1(a) 



we 



0.18 



have fixed 5t = 0.1 and we leave A as a free parameter; whereas in Figure 1(b) we take A 
and we plot P m as a function of the step size. 

The figures show good agreement between predicted and measured acceptance rates. We 
now use eq. ( |2.1[ ) to tune the MD integrator on a larger volume. Ultimately, we are interested 
in reducing the computational cost, which depends on the wall-clock time spent computing the 
force terms as well as the acceptance rate, and the autocorrelation time for the observables. We 
neglect the autocorrelation time in this discussion as they are not sensitive to the choice of integrator 
parameters as long as the acceptance rate is reasonable, and define our cost metric as 



cost 



trajectory CPU time 



For a nested integrator the numerator of this cost function is a function of the number of steps at 
each level times the CPU time required to compute the forces at that level. 



3. Tuning a real simulation 

As an application of our tuning technology, we are going to consider a HMC simulation of 
a 24 3 x 32 lattice with two flavours of Wilson fermions with K = 0.1580 and j8 = 5.6. The 
authors of ^ include two Hasenbusch fields with twisted mass fermions as "preconditioners", and 
they use a nested PQPQP integrator scheme with A = 1/6 and one force term at each level, as 
shown in Table [IJ where is the outermost level. For each level i, we show the corresponding 
number of steps m,, the type of force and its parameters, and typical times spent on force and force 
gradient computation. All times refer to runs on 128 nodes on the BlueGene/L at the University of 
Edinburgh. 
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Level i 


Steps nti 


Force 


F time 


FG time 





3 


Hasenbusch (ju = / jU = 0.057) 


37 s 


43 s 


1 


1 


Hasenbusch (p = 0.057 / /I = 0.25) 


9s 


12 s 


2 


2 


Wilson (jU = 0.25) 


2.5 s 


3.3 s 


3 


3 


Gauge 


0.25 s 


0.34 s 



Table 1: Set-up used in the HMC simulation described in Q, together with typical times spent on force 
computation. For convenience times for the force gradient computation used in §^2 are also shown here. 



For simplicity, we have fixed x = 1 in our tuning exercise. We also have restricted our search 
space to mo > 3 to avoid integrator instabilities (which occur when the BCH expansion breaks 
down). 

3.1 PQPQP tuning 

We now describe how we tuned the PQPQP integrator, and what results we obtained. We 
considered two different nested schemes: 

Al. The original scheme, but with tuned values of A. 

Bl. The two Hasenbusch fields appear now at the same level (so we have only 3 different levels). 

Table |2] shows the parameters which minimize the cost metric defined in the last section. For each 
scheme we show the number of steps at each level, the optimal A parameters, our predictions for the 
acceptance rate, the estimated time spent in force computation in one trajectory, and measurements 
of acceptance rates and trajectory times. For comparison we also show data for the original scheme. 



Scheme 


ntj 

12 3 


h 

12 3 


Prec 


iiction 
F time 
/ traj. 


Measi 

Pace 


irement 
Time 
/traj. 


Original 
Al 
Bl 


3 12 3 
3 112 
3 3 1- 


1/6 1/6 1/6 1/6 
0.185 0.188 0.184 0.183 
0.177 0.183 0.176 — 


0.85 
0.80 
0.82 


655 s 
541 s 
454 s 


0.89 
0.75 
0.83 


709 s 
578 s 
554 s 



Table 2: Tuning of the PQPQP integrator scheme. 

We see that both the tuning of the integrator parameters and changes to the scheme provide 
further improvements over an already well-tuned scheme. Indeed, with the B 1 scheme we get a 
1.3 x speedup. 

3.2 Force gradient integrator tuning 

We have again considered the two integrator schemes above, but used a PQPQP force gradient 
integrator [^] at all levels. As in this case there are no tuneable parameters, we could only vary 
the number of steps at each level. In Table |3| we show the best parameters we found as well as the 
measured values of acceptance rates and trajectory times. 
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Scheme 


nij 

12 3 


Pr 

p 

' acc 


ediction 
F + FG 
time / traj. 


Meas 

^acc 


jrement 
Time 
/traj. 


Original 
A2 
B2 


3 12 3 
3 111 
3 11- 


0.85 
0.96 
0.91 


655 s 
778 s 
565 s 


0.89 
0.91 
0.78 


709 s 
814 s 
626 s 



Table 3: Tuning of the force gradient integrator scheme. 



In this case, the acceptance rates are not sufficiently high to amortize the higher wall-clock 
time cost coming from the computation of the force gradient term. Thus overall, we could see 
no measurable improvement in the cost metric as compared to the regular PQPQP case. However, 
further improvement could be possible by either using other force gradient integrators [0], or tuning 
the Hasenbusch masses. We are currently working on these issues. 

4. Conclusions 

We have presented a novel way of tuning an integrator, together with a practical example 
using a moderate lattice size. This tuning procedure can be used for all lattice gauge and fermionic 
actions. We are working towards a general implementation of the calculation of Poisson brackets 
and force gradient terms in Chroma In the near future we will consider the tuning of HMC 
simulations on larger lattices and smaller quark masses, and we will also consider other widely 
used lattice actions. 
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The numerical results have been obtained using Chroma library 
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